{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Can OpenAI's new prompt caching feature boost the effectiveness of RAG applications?\n",
        "\n",
        "\n",
        "✅ **Prompt caching** is automatically applied by OpenAI to detect and reuse identical prompts previously sent to the API. This allows the system to use cached prompts rather than reprocessing similar ones from scratch.\n",
        "\n",
        "\n",
        "✅ The **benefits** from using Prompt caching include \"reducing latency (up to 80%) and lower costs (up to 50%) for longer prompts\".\n",
        "\n",
        "✅ **Constraints**: However, for caching to work, prompts must be at least 1024 tokens and should have static content such as instructions and examples at the beginning, with variable content at the end for consistent cache hits.\n",
        "This caching remains active for 5 to 10 min of inactivity, and it can persist up to one hour during off-peak period.\n",
        "\n",
        "✅ **Cache Hits** occurs when the system finds a matching prompt prefix, enabling caching. In contrast, a **Cache Miss** happens when no matching prefix is found, requiring the prompt to be processed from scratch.\n",
        "\n",
        "When there is no cache hits (either it's your first call or simply no similarity found) the number of caching tokens is equal to 0.\n",
        "\n",
        "You can find this value in the completion response object returned by the API.\n",
        "\n",
        "\n",
        "▶ **Can prompt caching work in RAG apps? Key Takeaways:**\n",
        "\n",
        "🔽 ⬇ Have a look at the end of the notebook 🔽 ⬇\n",
        "\n",
        "▶ **To explore prompt caching in a RAG workflow:**\n",
        "\n",
        "- I analyzed Amazon’s 10-K report using the LlamaIndex framework.\n",
        "- A simple reader/parser was used to extract data from the financial report.\n",
        "- Instead of the query engine, built upon the vectore store, I directly accessed the template prompts generated by LlamaIndex.\n",
        "- I used this template, placing the retrieved context first and the user query at the end.\n",
        "- The calls used OpenAI’s GPT-4o-mini.\n",
        "- I gathered final answers, cached token counts, and calculated total tokens sent in each prompt.\n",
        "\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "\n",
        "\n",
        "🔽 Discover the whole process : 🔽"
      ],
      "metadata": {
        "id": "7eexMPlQmx2Z"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "[Hanane DUPOUY](https://www.linkedin.com/in/hanane-d-algo-trader)"
      ],
      "metadata": {
        "id": "9N5qiekYnWEP"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rZNq9CDo3cON"
      },
      "outputs": [],
      "source": [
        "!pip install llama-index llama-index-core openai llama_index.embeddings.huggingface -q"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import nest_asyncio\n",
        "nest_asyncio.apply()\n",
        "\n",
        "from google.colab import userdata\n",
        "OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')\n",
        "LLAMAPARSE_API_KEY = userdata.get('LLAMACLOUD_API_KEY')"
      ],
      "metadata": {
        "id": "eMaoyHO5302b"
      },
      "execution_count": 3,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!wget \"https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf\" -O amzn_2023_10k.pdf"
      ],
      "metadata": {
        "id": "58mGZqKe39eC"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# VectorStore with embedding and SimpleDirectoryReader"
      ],
      "metadata": {
        "id": "zLeIqHp9LEDB"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# from llama_parse import LlamaParse\n",
        "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
        "import nest_asyncio;\n",
        "nest_asyncio.apply()\n",
        "\n",
        "pdf_name = \"amzn_2023_10k.pdf\"\n",
        "# use SimpleDirectoryReader to parse our file\n",
        "documents = SimpleDirectoryReader(input_files=[pdf_name]).load_data()\n",
        "\n",
        "embed_model = \"local:BAAI/bge-small-en-v1.5\" #https://huggingface.co/collections/BAAI/bge-66797a74476eb1f085c7446d\n",
        "\n",
        "vector_index_std = VectorStoreIndex(documents, embed_model = embed_model)\n",
        "# chunk size 1048 per default, chunk_overlap = 40"
      ],
      "metadata": {
        "id": "MnGwkAvq4Fay"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Compute tokens in the retrieved documents"
      ],
      "metadata": {
        "id": "VPFKoPRpLJFJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import tiktoken"
      ],
      "metadata": {
        "id": "UAvXa9cCBe4I"
      },
      "execution_count": 6,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "encoding = tiktoken.encoding_for_model(\"gpt-4o-mini\")\n",
        "print(encoding)\n",
        "\n",
        "idx_keys = vector_index_std.storage_context.vector_stores['default'].data.embedding_dict.keys()\n",
        "for key in idx_keys:\n",
        "  text = vector_index_std.docstore.get_node(key).get_text()\n",
        "  tokens_integer=encoding.encode(text)\n",
        "  print(len(tokens_integer))"
      ],
      "metadata": {
        "collapsed": true,
        "id": "KGua6Of0CHcb"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# #To modify the chunking size\n",
        "# from llama_index.core import Settings\n",
        "# from llama_index.core.node_parser import SentenceSplitter\n",
        "\n",
        "# Settings.text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)"
      ],
      "metadata": {
        "id": "AEvdL7nR91fa"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## LlamaIndex Prompts Template:"
      ],
      "metadata": {
        "id": "mYEtspapKLvh"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "▶ To understand how the templates are built in llamaIndex, you can try this method (or go directly to the documentation):\n",
        "\n",
        "I created a query engine, built upon the vectore store index:"
      ],
      "metadata": {
        "id": "B4dmN6LQU0zJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from llama_index.llms.openai import OpenAI\n",
        "llm_gpt4o_mini = OpenAI(model=\"gpt-4o-mini\", api_key = OPENAI_API_KEY)\n",
        "query_engine_gpt4o_mini = vector_index_std.as_query_engine(similarity_top_k=3, llm=llm_gpt4o_mini)"
      ],
      "metadata": {
        "id": "9H3XATYYF3tB"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Calling the LLM here isn't necessary, but I’m doing it to verify the reliability of the chunking.\n",
        "query1 = \"What was the net income in 2023?\"\n",
        "response = query_engine_gpt4o_mini.query(query1)\n",
        "print(str(response))"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "YOfX8A3XGTDk",
        "outputId": "97831cbf-db81-42bd-c43d-5c031a875795"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The net income for 2023 is not explicitly provided in the context information. However, the income (loss) before income taxes for 2023 is reported as $37,557 million, and the provision for income taxes is $7,120 million. To determine the net income, one would typically subtract the provision for income taxes from the income before income taxes. Therefore, the net income for 2023 can be calculated as follows:\n",
            "\n",
            "Net Income = Income (loss) before income taxes - Provision for income taxes\n",
            "Net Income = $37,557 million - $7,120 million = $30,437 million. \n",
            "\n",
            "Thus, the net income for 2023 is approximately $30,437 million.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "▶ From the query engine, you can get the prompts templates used by LlamaIndex for the QA and the refine answer. We'll use only the **text_QA_template**:"
      ],
      "metadata": {
        "id": "e27_ZnmHUZlo"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "query_engine_gpt4o_mini.get_prompts().keys()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "7PhCowZMG0uq",
        "outputId": "9088a23f-b4c4-4e14-ba72-8d4cc769c9a4"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "dict_keys(['response_synthesizer:text_qa_template', 'response_synthesizer:refine_template'])"
            ]
          },
          "metadata": {},
          "execution_count": 64
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### text_qa_template"
      ],
      "metadata": {
        "id": "9jZJI7wdKQMJ"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template']"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tm0BFmRcHIPa",
        "outputId": "7e35bc6b-14e8-4f86-a62b-ea2a0338ece5"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "SelectorPromptTemplate(metadata={'prompt_type': <PromptType.QUESTION_ANSWER: 'text_qa'>}, template_vars=['context_str', 'query_str'], kwargs={}, output_parser=None, template_var_mappings={}, function_mappings={}, default_template=PromptTemplate(metadata={'prompt_type': <PromptType.QUESTION_ANSWER: 'text_qa'>}, template_vars=['context_str', 'query_str'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, template='Context information is below.\\n---------------------\\n{context_str}\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: {query_str}\\nAnswer: '), conditionals=[(<function is_chat_model at 0x7c522deb88b0>, ChatPromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM: 'custom'>}, template_vars=['context_str', 'query_str'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, message_templates=[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, content=\"You are an expert Q&A system that is trusted around the world.\\nAlways answer the query using the provided context information, and not prior knowledge.\\nSome rules to follow:\\n1. Never directly reference the given context in your answer.\\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines.\", additional_kwargs={}), ChatMessage(role=<MessageRole.USER: 'user'>, content='Context information is below.\\n---------------------\\n{context_str}\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: {query_str}\\nAnswer: ', additional_kwargs={})]))])"
            ]
          },
          "metadata": {},
          "execution_count": 66
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template'].default_template.template#['default_template']"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        },
        "id": "b707gvQBHy94",
        "outputId": "dc24dd75-ef15-4356-eef8-9449a1363ea6"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'Context information is below.\\n---------------------\\n{context_str}\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: {query_str}\\nAnswer: '"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 77
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "prompt_llamaindex = query_engine_gpt4o_mini.get_prompts()['response_synthesizer:text_qa_template'].default_template.template\n",
        "prompt_llamaindex"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        },
        "id": "yvoHYBThH5B-",
        "outputId": "4b3de0ea-1520-44cd-af16-44eb52073e80"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'Context information is below.\\n---------------------\\n{context_str}\\n---------------------\\nGiven the context information and not prior knowledge, answer the query.\\nQuery: {query_str}\\nAnswer: '"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 78
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### refine_template"
      ],
      "metadata": {
        "id": "MuIK0sCSKVmR"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "I'm not using this template, but I'm showing in case you need it for your own project:"
      ],
      "metadata": {
        "id": "fwE__u9Bq9dN"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "query_engine_gpt4o_mini.get_prompts()['response_synthesizer:refine_template'].default_template.template#['default_template']"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 87
        },
        "id": "-DK_M_epHNCU",
        "outputId": "a46b1984-0999-4a04-83e4-e2471d6374a8"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "\"The original query is as follows: {query_str}\\nWe have provided an existing answer: {existing_answer}\\nWe have the opportunity to refine the existing answer (only if needed) with some more context below.\\n------------\\n{context_msg}\\n------------\\nGiven the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.\\nRefined Answer: \""
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 76
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Vector Store as retriver"
      ],
      "metadata": {
        "id": "GQH5ZMYSKgzR"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "▶ Create a retriver from the vector store index, so we can retrive the context related to our query and use it later in the process:"
      ],
      "metadata": {
        "id": "BV5lLlgxrUJF"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "retriever = vector_index_std.as_retriever(similarity_top_k=3)\n",
        "query_str = \"What was the net income in 2023?\"\n",
        "response = retriever.retrieve(query_str)\n",
        "print(str(response))"
      ],
      "metadata": {
        "id": "xORFsNfIKKfw"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "for res in response:\n",
        "  print(res.node.metadata)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "x6k40iz4M-4Z",
        "outputId": "d9bb8791-50b3-4f19-ebb2-e41f51d8165f"
      },
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{'page_label': '65', 'file_name': 'amzn_2023_10k.pdf', 'file_path': 'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598, 'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}\n",
            "{'page_label': '28', 'file_name': 'amzn_2023_10k.pdf', 'file_path': 'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598, 'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}\n",
            "{'page_label': '67', 'file_name': 'amzn_2023_10k.pdf', 'file_path': 'amzn_2023_10k.pdf', 'file_type': 'application/pdf', 'file_size': 800598, 'creation_date': '2024-10-27', 'last_modified_date': '2024-02-02'}\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Collecting the labled pages:"
      ],
      "metadata": {
        "id": "b9dVDON9rfMl"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "page_labels = []\n",
        "for res in response:\n",
        "  if res.node.metadata!={}:\n",
        "    print(res.node.metadata['page_label'])\n",
        "    page_labels.append(res.node.metadata['page_label'])"
      ],
      "metadata": {
        "id": "aUVWIRZkhGaI"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "page_labels"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_iBcu8d9hOSP",
        "outputId": "79f134e9-a32e-47b1-9b4f-d7a6da23322a"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['65', '28', '67']"
            ]
          },
          "metadata": {},
          "execution_count": 150
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Showing the context:"
      ],
      "metadata": {
        "id": "QjwC8VOJrknk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "context_str=\"\"\n",
        "for resp in response:\n",
        "  text = resp.node.get_text()\n",
        "  print(text)\n",
        "  context_str += text + \" \\n\\n\""
      ],
      "metadata": {
        "id": "gCNJCRfMLAgW"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Simple call to the chat completion to see cached token:"
      ],
      "metadata": {
        "id": "h_TjX0sdWt7v"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### First query:"
      ],
      "metadata": {
        "id": "lxs8AF5Arr89"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "query_str = \"What was the net income in 2023?\""
      ],
      "metadata": {
        "id": "5tcQWHBRM4ZO"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "prompt_llamaindex = f\"\"\"Context information is below.\n",
        "---------------------\n",
        "{context_str}\n",
        "---------------------\n",
        "Given the context information and not prior knowledge, answer the query.\n",
        "Query: {query_str}\n",
        "Answer: \"\"\""
      ],
      "metadata": {
        "id": "lEdmYOZSNHfV"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "from openai import OpenAI\n",
        "client = OpenAI(api_key = OPENAI_API_KEY)\n",
        "\n",
        "completion = client.chat.completions.create(\n",
        "  model=\"gpt-4o-mini\",\n",
        "  messages=[\n",
        "    {\"role\": \"system\", \"content\": \"You are a financial analyst expert.\"},\n",
        "    {\"role\": \"user\", \"content\": prompt_llamaindex}\n",
        "  ]\n",
        ")\n",
        "\n",
        "print(completion.choices[0].message)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "sdBonvRHMt6Q",
        "outputId": "8901f3ff-379d-41bc-c09e-ee749aa97bbe"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "ChatCompletionMessage(content='To calculate the net income for the year 2023, we can start with the income (loss) before income taxes and adjust it by the provision (benefit) for income taxes.\\n\\nFrom the provided information:\\n- Income (loss) before income taxes for 2023: $37,557 million\\n- Provision (benefit) for income taxes, net for 2023: $7,120 million\\n\\nThe formula for net income is:\\n\\n\\\\[ \\\\text{Net Income} = \\\\text{Income (loss) before income taxes} - \\\\text{Provision (benefit) for income taxes} \\\\]\\n\\nSubstituting in the values:\\n\\n\\\\[ \\\\text{Net Income} = 37,557 - 7,120 \\\\]\\n\\nCalculating this gives:\\n\\n\\\\[ \\\\text{Net Income} = 30,437 \\\\]\\n\\nThus, the net income for 2023 was **$30,437 million** or **$30.437 billion**.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(completion.choices[0].message.content)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "JTjIZDhwNuI_",
        "outputId": "1023167a-ad37-42a7-efde-53a3841cea43"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "To calculate the net income for the year 2023, we can start with the income (loss) before income taxes and adjust it by the provision (benefit) for income taxes.\n",
            "\n",
            "From the provided information:\n",
            "- Income (loss) before income taxes for 2023: $37,557 million\n",
            "- Provision (benefit) for income taxes, net for 2023: $7,120 million\n",
            "\n",
            "The formula for net income is:\n",
            "\n",
            "\\[ \\text{Net Income} = \\text{Income (loss) before income taxes} - \\text{Provision (benefit) for income taxes} \\]\n",
            "\n",
            "Substituting in the values:\n",
            "\n",
            "\\[ \\text{Net Income} = 37,557 - 7,120 \\]\n",
            "\n",
            "Calculating this gives:\n",
            "\n",
            "\\[ \\text{Net Income} = 30,437 \\]\n",
            "\n",
            "Thus, the net income for 2023 was **$30,437 million** or **$30.437 billion**.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "completion.usage.prompt_tokens_details #==> cahed_tokens = 0 ==> first call ==> normal"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Z-i68IddOK_W",
        "outputId": "178eaed3-dd04-4283-e463-5a2d2bcbd6c8"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "PromptTokensDetails(audio_tokens=None, cached_tokens=0)"
            ]
          },
          "metadata": {},
          "execution_count": 105
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Second query:"
      ],
      "metadata": {
        "id": "lPgCC2JbrvdE"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "query_str = \"What was the revenue in 2023?\"\n",
        "prompt_llamaindex = f\"\"\"Context information is below.\n",
        "---------------------\n",
        "{context_str}\n",
        "---------------------\n",
        "Given the context information and not prior knowledge, answer the query.\n",
        "Query: {query_str}\n",
        "Answer: \"\"\""
      ],
      "metadata": {
        "id": "AzHaT9EgOHu4"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "completion2 = client.chat.completions.create(\n",
        "  model=\"gpt-4o-mini\",\n",
        "  messages=[\n",
        "    {\"role\": \"system\", \"content\": \"You are a financial analyst expert.\"},\n",
        "    {\"role\": \"user\", \"content\": prompt_llamaindex}\n",
        "  ]\n",
        ")\n",
        "\n",
        "print(completion2.choices[0].message.content)\n"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "hSRrqDJKOgJp",
        "outputId": "9d7b6933-b8aa-487a-ecea-3bf9f8a0cfaa"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The provided context information does not include details about revenue for the year 2023. Therefore, I cannot determine the revenue for that year based on the information given. If additional data on revenue is available, it would be necessary to review that information to provide an answer.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "completion2.usage.prompt_tokens_details #==> cahed_tokens = 2688 ==> second call. So we have used 2688 tokens."
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "mEX1-dYEOkl_",
        "outputId": "3f43190c-dd51-46d1-dece-5aa1786f9945"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "PromptTokensDetails(audio_tokens=None, cached_tokens=2688)"
            ]
          },
          "metadata": {},
          "execution_count": 109
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "completion2.usage.prompt_tokens"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "uXtgWZE6fx1a",
        "outputId": "303e9144-e316-40fb-9e54-3ed56b43facd"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "2851"
            ]
          },
          "metadata": {},
          "execution_count": 143
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# All together: Caching Tokens"
      ],
      "metadata": {
        "id": "6gAjakiSOp4Q"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import tiktoken"
      ],
      "metadata": {
        "id": "NAEFZjFDSEkg"
      },
      "execution_count": 11,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "MODEL = \"gpt-4o-mini\"\n",
        "encoding = tiktoken.encoding_for_model(MODEL)\n",
        "print(encoding)\n",
        "\n",
        "from openai import OpenAI\n",
        "client = OpenAI(api_key = OPENAI_API_KEY)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "YFMtLRaJSGMw",
        "outputId": "a880c1d1-585e-43a7-dae9-813e4eac2b30"
      },
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "<Encoding 'o200k_base'>\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "def get_retrieved_context(query_str,retriver):\n",
        "  response = retriever.retrieve(query_str)\n",
        "  context_str=\"\"\n",
        "  for resp in response:\n",
        "    text = resp.node.get_text()\n",
        "    context_str += text + \" \\n\\n\"\n",
        "\n",
        "  page_labels = []\n",
        "  for res in response:\n",
        "    if res.node.metadata!={}:\n",
        "      # print(res.node.metadata['page_label'])\n",
        "      page_labels.append(res.node.metadata['page_label'])\n",
        "  return context_str, page_labels\n",
        "\n",
        "def get_template(query_str,context_str):\n",
        "  prompt_llamaindex = f\"\"\"Context information is below.\n",
        "  ---------------------\n",
        "  {context_str}\n",
        "  ---------------------\n",
        "  Given the context information and not prior knowledge, answer the query.\n",
        "  Query: {query_str}\n",
        "  Answer: \"\"\"\n",
        "  return prompt_llamaindex\n",
        "\n",
        "def call_gpt_4o(prompt):\n",
        "  completion = client.chat.completions.create(\n",
        "  model=MODEL,\n",
        "  messages=[\n",
        "    {\"role\": \"system\", \"content\": \"You are a financial analyst expert.\"},\n",
        "    {\"role\": \"user\", \"content\": prompt}\n",
        "    ]\n",
        "  )\n",
        "\n",
        "  llm_answer = completion.choices[0].message.content\n",
        "  cached_tokens_nbr = completion.usage.prompt_tokens_details.cached_tokens\n",
        "  # prompt_input_nbr_tokens = completion.usage.prompt_tokens\n",
        "  return llm_answer, cached_tokens_nbr\n",
        "\n",
        "def compute_nb_tokens(text):\n",
        "  tokens_integer=encoding.encode(text)\n",
        "  return len(tokens_integer)\n",
        "\n",
        "def get_final_answer(query_str,retriever):\n",
        "  context_str, page_labels = get_retrieved_context(query_str,retriever)\n",
        "  prompt_llamaindex = get_template(query_str,context_str)\n",
        "  llm_answer, cached_tokens_nbr = call_gpt_4o(prompt_llamaindex)\n",
        "  prompt_nbr_tokens = compute_nb_tokens(prompt_llamaindex) #You can also use: completion.usage.prompt_tokens in call_gpt_4o method\n",
        "  return llm_answer, cached_tokens_nbr, prompt_nbr_tokens, page_labels"
      ],
      "metadata": {
        "id": "xg0YfQmKOsQa"
      },
      "execution_count": 15,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "**With SimpleDirectory Retriver**"
      ],
      "metadata": {
        "id": "jAEhVZIsTPeQ"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In the following, I'll be asking different questions to see if the prompt caching is enabled:"
      ],
      "metadata": {
        "id": "WmRDB-1lQcSo"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Query 1:"
      ],
      "metadata": {
        "id": "Ty-LZ6SooUcF"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "This is the first call, so we expect 0 cached tokens:"
      ],
      "metadata": {
        "id": "6lJG0jlGobW_"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "queries_list = [\"What was the net income in 2023?\",\"What was the revenue in 2023?\", \\\n",
        "                \"What are the operating income in 2022?\", \"What are the operating expenses in 2021?\"]\n",
        "for query in queries_list :\n",
        "  resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query,retriever)\n",
        "  print(f\"query:\\n{query}\"+\"\\n\\n\")\n",
        "\n",
        "  print(f\"response:\\n{resp}\" +\"\\n\\n\")\n",
        "  print(f\"nbr_tokens in the prompt = {prompt_nbr_tokens}\" +\"\\n\")\n",
        "  print(f\"cached_tokens = {cached_tokens}\" +\"\\n\")\n",
        "  print(f\"page_labels = {page_labels}\" +\"\\n\")\n",
        "  print(\"--\"*50)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "dJF5yN4viLB_",
        "outputId": "c437201d-7c52-4be6-d153-ffadf50aef01"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "query:\n",
            "What was the net income in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "To calculate the net income for the year 2023, we need to start with the income before income taxes and subtract the provision (benefit) for income taxes. \n",
            "\n",
            "From the provided information for the year ended December 31, 2023:\n",
            "\n",
            "- Income before income taxes: $37,557 million\n",
            "- Provision for income taxes: $7,120 million\n",
            "\n",
            "Now, we can calculate the net income as follows:\n",
            "\n",
            "Net Income = Income before income taxes - Provision for income taxes  \n",
            "Net Income = $37,557 million - $7,120 million  \n",
            "Net Income = $30,437 million\n",
            "\n",
            "Therefore, the net income in 2023 was **$30,437 million**.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2840\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['65', '28', '67']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What was the revenue in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "The context provided does not explicitly state the total revenue for the year 2023. To determine the revenue, we would typically look for specified financial results in the company's income statement or performance reports, which is missing in the provided information. Based on this context alone, I cannot provide a specific figure for the revenue in 2023.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2599\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['51', '67', '66']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the operating income in 2022?\n",
            "\n",
            "\n",
            "response:\n",
            "The operating income in 2022 was $12.2 billion.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2079\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['25', '26', '28']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the operating expenses in 2021?\n",
            "\n",
            "\n",
            "response:\n",
            "The operating expenses in 2021 were not provided in the context information. Only the operating expenses for the years 2022 and 2023 were included. Therefore, based on the information available, we cannot determine the operating expenses for 2021.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2121\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['26', '55', '37']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Query 2"
      ],
      "metadata": {
        "id": "3Cz9Wl8boude"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In the second call, the same queries are sent again. The retriever returns the same context (same page labels, same prompt length), so each prompt shares its prefix with the first call and cached tokens appear:"
      ],
      "metadata": {
        "id": "HYNw155cot4X"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "queries_list = [\"What was the net income in 2023?\",\"What was the revenue in 2023?\", \\\n",
        "                \"What are the operating income in 2022?\", \"What are the operating expenses in 2021?\"]\n",
        "for query in queries_list :\n",
        "  resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query,retriever)\n",
        "  print(f\"query:\\n{query}\"+\"\\n\\n\")\n",
        "\n",
        "  print(f\"response:\\n{resp}\" +\"\\n\\n\")\n",
        "  print(f\"nbr_tokens in the prompt = {prompt_nbr_tokens}\" +\"\\n\")\n",
        "  print(f\"cached_tokens = {cached_tokens}\" +\"\\n\")\n",
        "  print(f\"page_labels = {page_labels}\" +\"\\n\")\n",
        "  print(\"--\"*50)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "OBiS1GPmn9uO",
        "outputId": "dd9e888c-429e-4de9-e70a-52be128221a6"
      },
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "query:\n",
            "What was the net income in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "To calculate the net income for 2023, we need to consider the income (loss) before income taxes and the provision for income taxes.\n",
            "\n",
            "From the data provided:\n",
            "- Income (loss) before income taxes in 2023: $37,557 million\n",
            "- Provision (benefit) for income taxes in 2023: $7,120 million\n",
            "\n",
            "Net income can be calculated as follows:\n",
            "\n",
            "Net Income = Income (loss) before income taxes - Provision for income taxes\n",
            "Net Income = $37,557 million - $7,120 million\n",
            "Net Income = $30,437 million\n",
            "\n",
            "Therefore, the net income in 2023 was **$30,437 million**.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2840\n",
            "\n",
            "cached_tokens = 2688\n",
            "\n",
            "page_labels = ['65', '28', '67']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What was the revenue in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "The provided context does not explicitly state the total revenue for the year 2023. However, it mentions that $12.4 billion of unearned revenue was recognized as revenue during the year ended December 31, 2023. To determine the total revenue for 2023, additional information regarding other revenue streams or total revenue figures for the year would be needed, which is not included in the provided context. Therefore, based solely on the information present, I cannot provide the total revenue for 2023.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2599\n",
            "\n",
            "cached_tokens = 2432\n",
            "\n",
            "page_labels = ['51', '67', '66']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the operating income in 2022?\n",
            "\n",
            "\n",
            "response:\n",
            "The operating income in 2022 for each segment is as follows (in millions):\n",
            "\n",
            "- North America: $(2,847)\n",
            "- International: $(7,746)\n",
            "- AWS: $22,841\n",
            "\n",
            "The consolidated operating income for the entire company in 2022 was $12,248 million.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2079\n",
            "\n",
            "cached_tokens = 1920\n",
            "\n",
            "page_labels = ['25', '26', '28']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the operating expenses in 2021?\n",
            "\n",
            "\n",
            "response:\n",
            "The operating expenses for the year ended December 31, 2021, can be derived from the information provided. However, the specific breakdown of operating expenses for 2021 is not included in the context you provided. \n",
            "\n",
            "The operating expenses mentioned for 2022 and 2023 are as follows:\n",
            "\n",
            "- For 2022, the total operating expenses are $501,735 million.\n",
            "- For 2023, the total operating expenses are $537,933 million.\n",
            "\n",
            "To answer your query accurately, we would need the specific operating expenses for 2021, which are not part of the provided context. \n",
            "\n",
            "If you have any additional information on the operating expenses for 2021 or need further assistance, please let me know!\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2121\n",
            "\n",
            "cached_tokens = 1920\n",
            "\n",
            "page_labels = ['26', '55', '37']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n"
          ]
        }
      ]
    },
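    {
      "cell_type": "markdown",
      "source": [
        "As a rough worked example of what these numbers mean, the share of each prompt served from the cache can be estimated from the output above. This is only an approximation: `nbr_tokens in the prompt` counts the user prompt alone, so the true prompt size reported by the API (`completion.usage.prompt_tokens`) is slightly larger and the real ratios slightly lower.\n",
        "\n",
        "$$\\text{cache hit ratio} \\approx \\frac{\\text{cached tokens}}{\\text{prompt tokens}}$$\n",
        "\n",
        "$$\\frac{2688}{2840} \\approx 94.6\\%, \\quad \\frac{2432}{2599} \\approx 93.6\\%, \\quad \\frac{1920}{2079} \\approx 92.4\\%, \\quad \\frac{1920}{2121} \\approx 90.5\\%$$\n",
        "\n",
        "Every cached count is also a multiple of 128, which fits OpenAI's description of cache hits growing in 128-token increments beyond the first 1024 tokens:\n",
        "\n",
        "$$2688 = 21 \\times 128, \\quad 2432 = 19 \\times 128, \\quad 1920 = 15 \\times 128$$"
      ],
      "metadata": {}
    },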
    {
      "cell_type": "markdown",
      "source": [
        "## Query 3"
      ],
      "metadata": {
        "id": "0xiz3043pHx_"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Here I ask completely different questions from those in Query 1 and Query 2. This context is retrieved for the first time, so the number of cached tokens is 0:"
      ],
      "metadata": {
        "id": "M3DbLjrupKSv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "queries_list = [\"What are the total assets in 2022?\",\"What are the current liabilities in 2023?\"]\n",
        "for query in queries_list :\n",
        "  resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query,retriever)\n",
        "  print(f\"query:\\n{query}\"+\"\\n\\n\")\n",
        "\n",
        "  print(f\"response:\\n{resp}\" +\"\\n\\n\")\n",
        "  print(f\"nbr_tokens in the prompt = {prompt_nbr_tokens}\" +\"\\n\")\n",
        "  print(f\"cached_tokens = {cached_tokens}\" +\"\\n\")\n",
        "  print(f\"page_labels = {page_labels}\" +\"\\n\")\n",
        "  print(\"--\"*50)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "I4b2nzEsiXjv",
        "outputId": "2557b1d4-6ebb-4728-abed-4d53a8d77d79"
      },
      "execution_count": 18,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "query:\n",
            "What are the total assets in 2022?\n",
            "\n",
            "\n",
            "response:\n",
            "The total assets in 2022 are $462,675 million.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2634\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['70', '40', '23']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the current liabilities in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "The current liabilities in 2023 are $164,917 million.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2359\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['67', '40', '66']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Query 4"
      ],
      "metadata": {
        "id": "FniIVVLHPipI"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Even though I only changed the years, the retrieved pages (page_labels) differ from those in Query 3 and come back in a different order. The prompts therefore diverge early, and no tokens are cached."
      ],
      "metadata": {
        "id": "qD2A8NQ0PJKg"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "queries_list = [\"What are the total assets in 2023?\",\"What are the current liabilities in 2022?\"]\n",
        "for query in queries_list :\n",
        "  resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query,retriever)\n",
        "  print(f\"query:\\n{query}\"+\"\\n\\n\")\n",
        "\n",
        "  print(f\"response:\\n{resp}\" +\"\\n\\n\")\n",
        "  print(f\"nbr_tokens in the prompt = {prompt_nbr_tokens}\" +\"\\n\")\n",
        "  print(f\"cached_tokens = {cached_tokens}\" +\"\\n\")\n",
        "  print(f\"page_labels = {page_labels}\" +\"\\n\")\n",
        "  print(\"--\"*50)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yRHyQFr_pWIt",
        "outputId": "05630f79-29b4-4891-ea5b-e53df7c85b26"
      },
      "execution_count": 19,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "query:\n",
            "What are the total assets in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "The total assets in 2023 amount to $527,854 million.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2159\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['70', '66', '40']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the current liabilities in 2022?\n",
            "\n",
            "\n",
            "response:\n",
            "The current liabilities for the year ended December 31, 2022, were $155,393 million.\n",
            "\n",
            "\n",
            "nbr_tokens in the prompt = 2320\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "page_labels = ['67', '63', '40']\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "queries_list = [\"What was the net income in 2023?\",\"What was the revenue in 2023?\", \\\n",
        "                \"What are the operating income in 2022?\", \"What are the operating expenses in 2021?\"]\n",
        "\n",
        "for query in queries_list :\n",
        "  resp, cached_tokens, prompt_nbr_tokens, page_labels = get_final_answer(query,retriever)\n",
        "  print(f\"query:\\n{query}\"+\"\\n\\n\")\n",
        "\n",
        "  print(f\"response:\\n{resp}\" +\"\\n\\n\")\n",
        "  print(f\"nbr_tokens in the prompt = {prompt_nbr_tokens}\" +\"\\n\")\n",
        "  print(f\"cached_tokens = {cached_tokens}\" +\"\\n\")\n",
        "  print(f\"page_labels = {page_labels}\" +\"\\n\")\n",
        "  print(\"--\"*50)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "qJRLEk7AP4yE",
        "outputId": "91e47b64-3271-40c0-bb31-9a7f4bc62616"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "query:\n",
            "What was the net income in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "To determine the net income for the year 2023, we can use the provided income before income taxes and the provision (benefit) for income taxes.\n",
            "\n",
            "Given:\n",
            "\n",
            "- Income before income taxes for 2023: $37,557 million\n",
            "- Provision for income taxes for 2023: $7,120 million\n",
            "\n",
            "Net income is calculated as follows:\n",
            "\n",
            "Net Income = Income before income taxes - Provision for income taxes\n",
            "\n",
            "Thus:\n",
            "\n",
            "Net Income = $37,557 million - $7,120 million = $30,437 million\n",
            "\n",
            "Therefore, the net income in 2023 was $30,437 million.\n",
            "\n",
            "\n",
            "cached_tokens = 2688\n",
            "\n",
            "nbr_tokens in the prompt = 2840\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What was the revenue in 2023?\n",
            "\n",
            "\n",
            "response:\n",
            "The context information provided does not specify the total revenue for 2023. However, it does mention that $12.4 billion of unearned revenue was recognized as revenue during the year ended December 31, 2023. To obtain the total revenue for 2023, we would need additional information or financial statements detailing the overall revenue figure for that year.\n",
            "\n",
            "\n",
            "cached_tokens = 0\n",
            "\n",
            "nbr_tokens in the prompt = 2599\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the operating income in 2022?\n",
            "\n",
            "\n",
            "response:\n",
            "The operating income in 2022 was $12.2 billion.\n",
            "\n",
            "\n",
            "cached_tokens = 1920\n",
            "\n",
            "nbr_tokens in the prompt = 2079\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n",
            "query:\n",
            "What are the operating expenses in 2021?\n",
            "\n",
            "\n",
            "response:\n",
            "The operating expenses for the year ended December 31, 2021, are not explicitly listed in the provided information. However, a breakdown of the total operating expenses and specific categories for the years 2022 and 2023 are provided. To answer the query, we would need the data for 2021, which is not included in the context. Therefore, we cannot provide the operating expenses for 2021 based on the information available.\n",
            "\n",
            "\n",
            "cached_tokens = 1920\n",
            "\n",
            "nbr_tokens in the prompt = 2121\n",
            "\n",
            "----------------------------------------------------------------------------------------------------\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Key Takeaways:"
      ],
      "metadata": {
        "id": "ZrEhadEJSpYw"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "▶ **Can prompt caching work in RAG apps? Key Takeaways:**\n",
        "\n",
        "- It depends on the prompt. If it begins with lengthy, static instructions or examples (e.g., few-shot examples), caching can be effective.\n",
        "- For prompts with brief instructions followed by dynamically retrieved context and a user-specific query, cache hits are unlikely because the context changes with each query. This is typical of small RAG apps with few users.\n",
        "- Caching can work if users repeatedly ask similar questions that retrieve the same context in the same order.\n",
        "- For shared RAG systems, especially in organizations, frequent queries are more likely to hit the cache, which can help reduce latency and costs."
      ],
      "metadata": {
        "id": "fmP_Pa8aSrco"
      }
    }
  ]
}